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An asymptotic maximum principle 
for essentially linear evolution models 

Abstract. Recent work on mutation-selection models has revealed that, under specific 
assumptions on the fitness function and the mutation rates, asymptotic estimates for the 
leading eigenvalue of the mutation-reproduction matrix may be obtained through a low- 
dimensional maximum principle in the limit N — > oo (where iV is the number of types). In 
order to extend this variational principle to a larger class of models, we consider here a fam- 
ily of reversible N x N matrices and identify conditions under which the high-dimensional 
Rayleigh-Ritz variational problem may be reduced to a low-dimensional one that yields the 
leading eigenvalue up to an error term of order 1/N. For a large class of mutation-selection 
models, this implies estimates for the mean fitness, as well as a concentration result for the 
ancestral distribution of types. 



1. Introduction 

Many systems of population biology or reaction kinetics may be cast into a form 
where individuals (or particles) of different types reproduce and change type in- 
dependently of each other in continuous time. If the types come from a finite set 
S and the population is so large that random fluctuations may be neglected, one is 
led to a linear system of differential equations of the form 

y = yH (D 

I s 

with initial condition y(0). Here, y = (y.;)ies G ^->o holds the abundance of the 
various types; H = (Hij)ij£s is an | *S>* |x | *S" | matrix, which represents a linear oper- 
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ator on M' s l . The main application we have in mind here is in population genetics, 
where types are alleles, so that Equation Q is a haploid mutation-reproduction 
model; but one may also think of a compartment model, where types are locations 
of a certain chemical. Important examples of the analogous discrete-time dynam- 
ics include models of age-structured populations, which are often referred to as 
matrix population models, see Caswell's monograph 1131 . In line with large parts 
of the population genetics, and most of the stochastics, literature, we will use the 
convention that y is a row vector to which H is applied from the right, so that Hij 
(i 7^ j) is the coefficient for the change from i to j. 

We will assume throughout that the linear operator H generates a positive 
semigroup, {cxp(iiJ) | t ^ 0}. Since S is finite, this is equivalent to ^ 
for i j. The flow so generated leaves ML^ invariant. We will further assume 
that H is irreducible (i.e., if G(H) is the directed graph with an edge from i to j 
if i 7^ j and > 0, then there is a directed path from any vertex to any other 
vertex). 

We will often use the decomposition 

H = M + R (2) 

into a Markov generator M and a diagonal matrix R. More precisely, we have 

M = (Mij)i tje s with Mij := for i ^ j, M vi := - J2jeS\{i} M H ( so 

T,j&5 M *3 = °)« and R = d ™gi R i I * e S] With R i : = H H - M H- 

Clearly, the decomposition in (|2ji is unique, and M is irreducible iff H is, because 
G(M) = G(H). Mij is the rate at which an i-individual produces j-offspring 
(j ^ i), and Ri is the net rate at which individuals of type i reproduce themselves; 
this may also include death terms and thus be negative. 

Solutions of cannot vanish altogether (unless y(0) = 0), since tr(iJ) is 
finite, hence det (exp(tH)) = exp(ttr(if)) > and ker( exp(iiJ)) = {0}, for 
all t ^ 0. Therefore, we may also consider the corresponding normalized equation 
for the proportions pi := yi/{J2j£S fj)> wn i cn i s often more relevant. Clearly, 

Pi = ^2 Pj m ji + fa ~ zJ / ''//'')''' ■ 

jes jes 

In the population genetics context, this is the mutation-selection equation for a 
haploid population, or a diploid one without dominance; for a comprehensive re- 
view of this class of models, see 1101 . It is well known, and easy to verify, that the 
way back from Q to Q is achieved through the transformation 1461 

y{t) :=p(t)exp ( Vfl, f Pj(r)dr) . 

This substitution can thus be viewed as a global linearization transformation and 
explains why Q is an 'essentially linear' equation. In fact, Eq. Q appears in a 
variety of contexts. In particular, its discrete-time relative may be used to describe 
the dynamics of the age structure of a population, compare II II Ch. 4]. Due to its 
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frequent appearance, a better understanding of Eq. Q and its solutions is the main 
motivation for the present work. 

Clearly, the solution of (0 is obtained from that of Q through normalization: 

y(t)=y(0)cMtH), p(t) - 



Of course, proportions of types in a population that grows without restriction 
(which is biologically reasonable only over short time scales) do not represent 
the only way in which may arise. Actually, the same equation for p results if 
([0 is replaced by 

V = y{H->y{t)), 

where ^(t) is some scalar (possibly nonlinear) function which describes the elimi- 
nation of individuals by population regulation. This is obvious from the invariance 
of under Ri — > B4 + j(t) if performed simultaneously for all i. The func- 
tion j(t) may, for example, describe an additional death term caused by crowding, 
which may depend on t through y, but acts on all types in the same way. 

Eq. may be read in two ways (cf. |28|). If mutation and reproduction go 
on independently of each other, the parallel (or decoupled) version is adequate. 
Here, every i-individual gives birth to offspring of its own type at rate £?j, dies at 
rate Di, and mutates to j at rate My (J ^ i). Then B4 := Bi — Di is the net 
reproduction rate or Malthusian fitness 1151 Ch. 5.3], and Eq. is immediate. 
If, however, mutation is a side effect of reproduction (through copying errors of 
the replication process, for example), the coupled version is more relevant 1 11251 . 
When an i-individual reproduces (which it does, as before, at rate Bi, while it dies 
at rate Di), the offspring is of type j with probability V%j Vij — !)■ This leads 
to 

jes jes 

where, again, R4 = Bi — Di . But if we set My : = Bi (Vij — Sij ) , we arrive again at 
Eq. (0. In both cases, Y^j RjPj is the mean fitness of the population. Obviously, a 
mixture of both the parallel and the coupled mutation mechanisms can be tackled 
in a similar way. Furthermore, the decoupled model arises as the weak-selection 
weak-mutation limit of the coupled one 1281 . or of the corresponding model in 
discrete time p. 98]. 

The model © also arises in the infinite population limit of the well-known 
Moran model with selection and mutation, see 1191 Ch. 3] or 1161 p. 126]. This is a 
stochastic model where, in a population of m individuals, every individual of type 
i reproduces at rate Bi, and the offspring, which is of type j with probability Vij, 
replaces a randomly chosen individual in the population (possibly its own parent). 
To describe the entire population, let Zi (t) be the random variable that gives the 
number of i-individuals at time t, and Z(t) — (Zj(f)) f g . Hence, if Z(t) = z, 
and j ^ k, we can have transitions from z to z + ej — e&, where Cj denotes the 
unit vector corresponding to j. Such a transition occurs at rate ^\ BiVijZiZk/m. 
Let us look at the influence of increasing m, whence we write Z^ m \t) to indi- 
cate dependence on system size. As m — > 00, the sequence of random processes 
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Z( m >(t)/m converges pointwise almost surely, and even uniformly for every fi- 
nite interval [0, t], to the solution of the differential equation l|4} with Di = 0, 
and initial condition Z^ m \Q)/m (resp. its limit as m — ► oo), compare 1181 Thm. 
11.2.1]. 

The linear equation Q has a more direct stochastic interpretation in terms of 
a continuous-time multitype branching process. After an exponential waiting time 
with expectation t^, an individual of type i produces a random offspring with a 
finite expectation of bij children of type j (we will not specify the distribution 
explicitly since we will not fully develop the stochastic picture here). The matrix 
H with Hij = (bij — Sij ) /Vj then is the generator of the first-moment matrix. That 
is, if Zj (t) is again the (random) number of individuals of type j at time t, and E 1 
the associated expectation in a population started by a single i individual at time 
0, then 

E i (Z j (t)) = (exp(tH)).., (5) 

Furthermore, with the identification j/,(t) = E(Zi(t)), Equation Q then simply is 
the forward equation for the expectations. (See or 1321 for the general context of 
multitype branching processes, and £26] for the application to mutation-selection 
models.) 

Important first questions concern the asymptotic properties of the systems dis- 
cussed. A key to these properties is the leading eigenvalue, A max , of H (i.e., the 
real eigenvalue exceeding the real parts of all other eigenvalues). If, on short time 
scales, unrestricted growth according to Q is relevant, then A max is the asymp- 
totic growth rate of the population (and is related to the chance of ultimate sur- 
vival). The stationary distribution of types in (0 is given by the left eigenvector 
of H corresponding to A max . We will call it the present distribution of types, as 
opposed to the (less well-known, but equally important) ancestral distribution that 
is obtained by picking individuals from the present distribution and following their 
ancestry backward in time until a new stationary state is reached. This ancestral 
distribution is given by the elementwise product of the left and right eigenvectors 
of H corresponding to A max , with proper normalization |29 30 1. The knowledge 
of A max is a prerequisite for the calculation of these eigenvectors. In the population 
genetics context, the present distribution is often referred to as mutation-selection 
balance, with A max as the mean fitness. Finally, and perhaps most importantly, 
the dependence of A max on certain model parameters is of great interest. For ex- 
ample, a lot of research has been directed towards the question of how the mean 
fitness changes when the mutation rate increases (i.e., when M is varied by some 
nonnegative scalar factor), and interesting effects have been observed, for exam- 
ple so-called error thresholds. They may be defined as non-analytical changes of 
Amax that occur when the mutation rate surpasses a critical value, in analogy with 
a phase transition in physics. This is accompanied by a discontinuous change in 
the ancestral distribution, as well as pronounced changes in the present distribution 
of types; see 1101 Ch. Ill] and 1171 for general reviews, 1261 for recent results and 
a classification of the various threshold phenomena that may occur, and 1241 for a 
recent application to the evolution of regulatory DNA motifs. 
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In general, exact expressions for eigenvalues are hard to obtain if |5| is large 
but fixed. In recent work on mutation-selection models, however, scalar or low- 
dimensional maximum principles for the leading eigenvalue have been identified 
for certain examples in a suitable continuous limit as |5| — ► oo, see 12612 II . It is 
the purpose of this paper to generalize these results to a large class of operators. 
We will do so under the general assumption that the Markov generator M is re- 
versible, which means that the equilibrium flux from state i to state j is the same 
as that from j to i. This entails that the mutation process is the same in the forward 
and backward direction of time, and covers many of the frequently-used models 
in classical population genetics, for example, the house-of-cards model, and the 
random-walk mutation model with Gaussian mutant distribution (see ifTol Ch. 3] 
for its definition, (42 1 for the reversibility aspect, and a more general class of re- 
versible random-walk mutation models). Also, practically all models of nucleotide 
evolution that are in use in molecular population genetics, like the Jukes-Cantor, 
Kimura, Felsenstein, and HKY models, cf. 1441 or 1201 Ch. 13], are reversible. 
This property is particularly important in phylogenetic inference, where one relies 
on looking back from the present into the past. 

The paper is organized as follows. In Section |3 we will apply the Rayleigh- 
Ritz maximum principle to our class of matrices. This leads to a high-dimensional 
problem, which is hard to solve in practice. An example of how the problem may 
be reduced to a scalar one is given in Section [3] The main results are presented 
in Section 0] Here, we identify fairly general conditions under which the high- 
dimensional problem may be reduced to a low-dimensional variational problem 
that yields the leading eigenvalue up to an error term of order l/N, in the limit 
N = \S\ — > oo. Sections [5] and [6] are devoted to the lumping procedure. They 
show that a large class of models on a type space S arises, in a natural way, from 
models defined on a 'larger' space S, by combining several types in 6 into a single 
one in S. The general framework is set out in Section|5] and in Section[6] we apply 
it to the important case where 6 is the space of all sequences of fixed length over a 
given alphabet. Section[7]makes the connection back to the maximum principle and 
shows how the lumping procedure may lead to 'effective' models (on S) to which 
our asymptotic results may then be applied. The Hopfield fitness function, along 
with sequence space mutation, emerges as an example. In Setion 8, we summarize 
our findings and discuss them informally, and in a more biological context. 

2. The general maximum principle for reversible generators 

Let us first fix our assumptions and notation. Since we assume M to be an irre- 
ducible Markov generator, Perron-Frobenius theory, cf. [31] Appendix], tells us 
that it has a leading eigenvalue which exceeds the real parts of all other eigenval- 
ues, and an associated strictly positive left eigenvector tt. This will be normalized 
s.t. iTi = 1; then, tt is the stationary distribution of the Markov semigroup 
generated by M. 

We will assume that M is reversible, i.e., 

TTiMij = TTjMji (6) 
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for all i and j, which also entails 7^77^ = Kjflji since R is diagonal. Likewise, 
due to irreducibility, the leading eigenvalue, A max , of 77 is simple; we will en- 
counter the corresponding eigenvectors in due course. 

Let us note in passing that, due to reversibility combined with irreducibility, 
the equilibrium distribution ir of M is available explicitly as follows 1341 p. 35]. 
Let (vi,i>2, ■ ■ ■ ,v\s\) be the vertices of the directed graph G(M) (with (vi,Vj) 
a directed edge iff .\/, , > 0). Since Hi > for all i G 3, (Vj,Vi) is an edge iff 
(v.i, Vj) is, as a consequence of (|6j. Now, set w i = 1 and consider any 2 ^ I ^ 
By irreducibility, there is a directed path along v\ = Vk , Vk x , ■ ■ ■ , Ufe m = vi, 
which also exists as a path in reverse direction. If we now set 

TTi = ^i/iYljes ™i) ^ S tne stationary probability distribution of the Markov gen- 
erator M. This reflects the path independence of reversible Markov chains 1341 
p. 35]: For any path with an arbitrary number m + 1 of vertices (fco, fci, . . . , A; m ) 
in our graph G(M), the product n^=i(-^Af-i,fci/-^*j->fcj-i) onl y depends on the 
initial and final vertices, fco and k m , not on the path connecting them. Note that, if 
G(M) admits a Hamiltonian path, the calculation in @ can be further simplified 
by following such a path edge by edge. 

It is well-known that reversibility has important consequences for eigenvalues 
and eigenvectors of a Markov generator. An excellent exposition for the closely- 
related discrete-time case is Chapter 2.1 of |8|. Following these lines, we now 
define, for i ^ j, 

Fij : y/nMij — = Fji , (8) 

where the symmetry follows from the reversibility of M. Clearly, Fij ^ and 

= {FijFji) 1 / 2 = (MijMji) 1 ' 2 . As a consequence, the matrix 

H := n 1/2 Hn^ 1/2 (9) 

with 77 := diag{7Ti | i G S} has off-diagonal entries Fij, is symmetric and has real 
spectrum identical to that of 77, with correspondingly transformed eigenvectors. 
We now decompose 77 in the same way as we did with 77 in (0, namely into a 
Markov generator F plus a diagonal matrix E. To this end, let F = (Fij)ij e s 
with Fij as in for i ^ j, and complete this by Fa := — Y^jeS\U} ^j- With 

E t := Ri + M u - F u //. • ^ {2^/M~M~~ (M tJ + Mji)) , 

j>i 

one now has 77^ = E^ + EiSjj for all i, j G S, i.e., 

H = F + E (10) 
with F a Markov generator and E = diag{75; | i G S}. 
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This now allows us to formulate a suitable variant of the Rayleigh-Ritz (or 
Courant-Fisher) maximum principle for the leading eigenvalue of H , compare 1411 
Thm. 19.4]. Clearly, 

Amax = SUp ^ ViHijVj 

= SU P 2 _ ( Yl V ^ V i + H E k y t) . c 11 ) 

^Eies"?- 1 ijes fees 

where we have used the decomposition dlOt in the second step. Note that the supre- 
mum is, indeed, assumed, since the space of probability measures on S is com- 
pact. The maximizer, i.e., the normalized principal eigenvector of H, is unique 
and strictly positive (since the same holds for the corresponding eigenvector of 
H), so that the above may also be read as an L 1 variant through the substitution 
v % ■= v f- 

Note that, since F is a Markov generator, the quadratic form ^\ jgs v i^ij v j 
is negative semidefinite with maximum 0, which is assumed for the stationary 
distribution of F (since F is symmetric and irreducible, this is the equidistribution, 
and unique). We thus have a simple upper bound on A max : 

Amax < sup V" E k v 2 k = max E k , (12) 
v .y tl 2_ 1 f—t fees 

"■Lfes")" 1 fceS 

while we can obtain a lower bound for any v ^ with J^i v \ = 1 y i a 

^ ViFijVj + ^ E k V k ^ ^max ■ (13) 

i,jes fees 

Even though each step of the above derivation is elementary, it is worthwhile 
to summarize the findings as follows. 

Proposition 1. Let S be a finite set, and let H be an \S\x\S\-tnatrix with decompo- 
sition H = M + R into an irreducible and reversible Markov generator M and a 
diagonal matrix R. If n is the stationary distribution of M, H can be symmetrized 
to H = F1 1 I' 2 HF1~ 1 I' 1 with II = diag{-7Ti | i e S}. The matrices H and H are 
isospectral, and their leading eigenvalue A max is given by the maximum princi- 
ple ( II It . Furthermore, simple upper and lower bounds for A max are provided by 
Eqns. M21 and Jl 31 . □ 

It is our aim to identify conditions under which the inequality dl 2t becomes an 
equality, at least asymptotically as \S\ — * oo. 

As a first step, consider the maximizer of dl Q , i.e., the principal eigenvector 
w of H, normalized via X^ieS w i = Since H is a symmetric matrix, we have 
wH = X max w and, simultaneously, Hw T = A max w T . Hence, 

z T := c z n- 1/2 w T and h := c h wFI 1/2 (14) 

are the principal right and left eigenvectors of H = n~ 1 / 2 HLJ 1 / 2 . We will adjust 
the constants Ch and c z s.t. J^i hi = 12i ^i z i = 1' clearly, this implies c z ■ Ch = 1. 
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The vector h gives the stationary distribution of types in Equation 0. Further- 
more, it is well-known that, for irreducible H and t — > oo, the matrix exp(t(H — 
Amaxl)) becomes a projector onto h, with matrix elements Zihj (compare 1311 
Appendix]). Therefore, 

Um S jeg (cxp(*g)) i3 - = Ejesfihj = ^ ^ (15) 

With (J3 in mind, Z{ may therefore be understood as the asymptotic offspring ex- 
pectation of an i individual, relative to the mean offspring expectation of an equi- 
librium population. If R = CI for some constant C, we have Zi = 1, in line with 
the fact that H — CI is then a Markov generator. 

From dl4> . along with the normalization of h and z, the relations 

hi = 7r ' Z ' and wf = h l z l (16) 

are obvious. In particular, with 

a, := wj = hjZi > , (17) 

we obtain the corresponding L 1 -maximizer of i ll Q . 

To arrive at another interpretation of a, consider the Markov generator Q with 
elements 

Qij = Z i — KnaxSij)Zj . (18) 

It is easily confirmed that Q is indeed a Markov generator (i.e., Qij ^ for i ^ j, 
and J^j Qij — 0)- Using fl!6l > and reversibility, one observes that Q may also be 
written as 

Qij = h i 1 {Hji — X niax dij)hj . (19) 

In this form, Q is the generator of the backward process on the stationary distri- 
bution as described in 1301 Corollary 1] for general multitype branching processes, 
and used in 1261 in the context of mutation-selection models. Loosely speaking, 
Q describes the Markov chain which results from picking individuals randomly 
from the stationary distribution h and following their lines of descent backward in 
time. Eq. Jl 81 is the corresponding forward version as used in 1291 and 1231 . It is 
immediately verified that Q has principal left eigenvector (i.e., stationary distribu- 
tion) a. This is known as the ancestral distribution of types (as mentioned in the 
Introduction); its properties are analyzed in 1231 . Let us summarize this as follows. 



Proposition 2. Let the assumptions be as in Proposition 1. Then, the principal 
eigenvector w of H gives the principal left and right eigenvectors of H and their 
mutual relations through Eqns. (1141 and i\6\ . The L x -maximizer a = (ai),gs of 
Jl It admits the interpretation of an ancestral distribution as the stationary state 
of the backward Markov generator O of ( fT8t and ( fT9t . □ 
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3. A scalar maximum principle: An example 

The maximum principle (lilt is not very useful in practice if \S\ is large but 
fixed, since maximization is then over a large space. In 1 26 1 , this high-dimensional 
maximization could be reduced to a scalar one for special choices of M and R. 
We will re-derive this result here in a simplified way, which will also serve as 
an introduction to the more general methods and results we are aiming at. Let 
S = {0, 1, . . . , N} with the following mutation scheme: 



m 



N-l 



4-x 



N 



Suppressing the (relevant!) dependence on N in the notation, we then have 

M iti+1 = U+, M M _i = Uf (20) 

for i <G S, where we set = Uq = 0. This is a variant of the so-called single- 
step mutation model of population genetics 1 10 Ch. HI.4]. It emerges if sequences 
of sites (nuceotide sites or loci) are considered, and the 'type' is identified with the 
number of sites at which the sequence differs from a given reference sequence or 
wildtype; see 1431 for a recent application. If fitness is a function of this number 
only, and if mutations occur independently of each other in continuous time, we 
are in the setting of the single-step mutation model. 
Hence, for all i £ S, we have 

Fis+i = (M M+ iA/ l+M ) 1/2 = (U+lf- +1 ) 1/2 = F i+hi (21) 

with the obvious meaning for i = and i = N; also, := whenever either 
i or j is not in S, or if \i — j\ > 1. In order to evaluate the lower bound in (II 31 . 
let N be large, 1 ^ L <C N, and < £ S. We will use the simple test function 
v := (i>o, v\, . . . , vn) defined through 

fo, i$ (i + [-L,L])nS 
Vi ~ c ''\l, ie(£+[-L,L])r\S 

with [-L, L] := {—L, —L + 1, . . . , L — 1, L}, and the constant c e chosen so that 
Y^i^i = 1. That is, v is a normalized step function around I, which does not 
extend beyond or N. If I + [-L, L] C S, one always has c e = 1/(2Z + 1); a 
short calculation shows that, in any case, 

1 1 

s$ c, ^ 



2L + 1 L + l 

due to L -C N. With v i = vf, the quadratic form in an d i!3l reduces to 

^ ViFyVj = c e F t j = — c £ (F^„ Lj £_ L _ 1 + F f+L j +L+ i) , 

ides %,jet+[-L,L] 
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due to the tridiagonal nature of the Markov generator F. Since 

~(Ft-L,e-L-i + Fi+L.c+L+i) ^ maxFj. i+ i = maxFjj' =: F max , 
one has 

| 2^ ^ L + i ■ < 22 ) 

ijes 

On the other hand, the second term in resp. dl 31 (to be called the 'diagonal 
part' in what follows) becomes 

c+l 

J2 Eivl = c t £ (Ri U+ -Uf + ^JU+Ur +1 + ^/UrU^ ) , (23) 

ies i=e-L 

where Uf := is implied whenever i ^ S. 

Employing Landau's O-notation fJ] Ch. 1], we now assume that 

Uf = m ± (x,) +0(1/N) and Ri = r(x t ) +<D(1/N) (24) 

with continuous functions u + , u~ , and r on [0,1], and the new 'type variable' 
Xi = i/N; it is further implied that the constant in the 0(1/N) bound is uniform 
for all i. (Eq. ( 1241 differs from the scaling in 1261 by a global factor of N, which 
means nothing but a change of the time scale.) 

Define g(x) := u + (x) + u~(x) — 2y / u + (x)u~ (x), let x* be a point at which 
r(x) — g(x) assumes its supremum, and choose £ := [Nx*\. With an appro- 
priate scaling of L (such as L ~ y/~N, to be specific), the right-hand side of 
d22l is 0(1/ VN). In d23l . the sum has O(VN) terms, which is balanced by 
c ( = 0(1/ vN); together with J24i . this turns the right-hand side of d23l > into 
r(x*) — g(x*) + 0(1/N). At the same time, the upper bound in ill 2b also behaves 
like r(x*) - g(x*) + 0(1/N). Thus, the right-hand side of contributes the 
largest error term, so that we obtain the asymptotic maximum principle 

Amax = sup (r(x) - g(x)) (25) 
xe[o,i] 

up to 0(1/ VN), as N -> oo. 

Finally, recall from Section|2lthat, for finite N, the maximizer of (1 1 1 i is unique 
and given by the ancestral distribution a = (hiZiji^s- However, in the limit as 
N — > oo, uniqueness may be lost, which is also reflected by the fact that the 
supremum in d25i may be assumed at more than one point. It is these degenerate 
situations where error thresholds may occur 1261 . 

Remark 1. The maximum principle (I25> also holds for functions r and u ± with a 
finite number of jumps 1261 . This can be dealt with in the current framework with 
slightly more effort, but we avoid this here to keep the example as transparent as 
possible. 

Remark 2. With a more careful choice for the scaling of L, one gets the quadratic 
form (defined by the matrix F) down to ©(l/iV 1 ^) for arbitrary e > 0, but 
0(1/ N) is only obtained with the help of better (smooth) test functions. This will 
now be done. 
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4. An asymptotic maximum principle: the general case 

The maximum principle allows for an asymptotic estimation of the leading eigen- 
value when the Markov generator F can be considered as 'small' in a suitable 
sense, in comparison to the derived effective 'diagonal' part as defined by E. Be- 
fore stating precise conditions and results, let us briefly discuss the heuristics be- 
hind this. Due to the symmetry of F, we can rewrite Eq. (II \\ as 

A max = sup ( - - J2 Fij («< " "j f + £ E k v k) ■ ( 26 ) 



o 

'"■T.ies v e= 1 i,jeS keS 

Thus, it is obvious that the F-term favours constant v while the diagonal _E-part 
favours v that are concentrated on the points k where Ek is maximal. Clearly, 
the outcome of this competition depends on some concentration and smoothness 
properties of the matrices involved. 

For simplicity, let us now assume that our set S consists of integers or, more 
generally, d-tuples of integers. So, S C Z d , with |5| < oo. (It will become ap- 
parent later that this is not the most general choice possible, but a relevant and 
convenient one, with obvious extensions.) We will now look more closely into the 
situation where IS*! /* oo. Consider a family of sets 

S = S(N), S C Z d , so that \S\ ~ N d as N -> oo, (27) 

where we suppress once again the dependence of S on N. A reasonable setup is 
then obtained if -4 • S C D, where D is a compact domain in R d , i • S becomes 
dense in D for N — ► oo, and there exist functions E and from C%(D, R) (i.e., 
twice continuously differentiable with bounded second derivatives) with 

E i = E iV)+0(^] (28) 



N J \N 
and 

^ = hQ+oQ), (29) 

where k — j — i, and the constant in the 0(1/N) bound is uniform for all i and j. 
More generally, one can replace 0(l/N) in d28i and d29i by 0(l/rj(N)) for some 
function r/(N) that grows with N, if that better suits the individual situation. (Note 
that our notation is slightly abusive in that E denotes both the matrix defined by 
dlOl , and the function approximating its elements; however, the meaning is always 
obvious from the context.) 

Our main result will be the following theorem. For S C Z d , we will use 
throughout the shorthand notation S — i := {j — i | j 6 S}. 

Theorem 1. Assume that Ei and Fij are as in Eqns. J28I and \29\ . Assume further 
that the C\ (D, R) function E assumes its absolute maximum in int(D), and that 
f satisfies 



E 

fees- 



fk(^)\k e \k 2 m ^ C (30) 
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for some constant C, uniformly for all i G S, and 1 ^ £, m ^ d. Then, there exist 
constants $J C , C" < oo such that 

£(z*)-^<A max <£(z*) + ^ r , (31) 
where x* is a point where E{x) assumes its maximum. 

Remark 3. It will become clear when we proceed that the condition on the deriva- 
tives of E(x) and the fk(x) may be relaxed; it is indeed sufficient that these func- 
tions be continuous and locally C$, in a neighbourhood of x*. 

Note that the upper bound is clear in view of Eqns. d28l and (II 21 (recall that 
the quadratic form defined by F is negative semidefinite); it can be made sharper 
if the order of the approximation in (1281 and ( I29t is improved. It remains to prove 
the lower bound (which cannot be improved by sharpening the 0(1/N) in A28I 
and &29\). We will do so by evaluating the quadratic form in j26t for a sequence 
of test functions of Gaussian type centred around x* in the interior of D (and 
approaching a Dirac measure located at x* with increasing N). Specifically, we 
will use throughout 

v, := ce -<*mi/N-x'\ 2 with c = c ( iV ) s t ^ v\ = 1, (32) 



iGS 



where a > is a positive real number independent of N. 
We will first consider the diagonal part and show 



Proposition 3. Let E$ be as in d28l . and let x* be a point in the interior of D where 
E(x) assumes its maximum. Let the t>, be as in Eq. J32i . Then, 



J2EiV*=E(x*)+oQ) 



The upper bound in the proposition being immediate, we only need to prove 
the lower bound. We will use the following 

Lemma 1. Let g : M. d — > R^o be a non-negative, continuous, integrable function 
with g(x) ^ C / (l + \x\) d+e for all x, and (fixed) positive constants C and e. Then, 
for any x* G R d , 

lim -^j q( — — nx*\ = [ q(x)dx. (33) 

Proof. Note first that the sum in (I33l > exists for arbitrary, but fixed n due to the 
assumed decay condition for g. Let b n := X fe=1 (— l/2n, l/2n]. Then, one has 
M. d = [J ieZ d (i/n + b n ), and, for all x, there is a (unique) element 7 of Z d /n with 
x G (7 + b n )', this will be called j n (x). We now define 

g£(x):= sup g(z), g~(x) := inf g(z) . (34) 
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Since integration over R d is invariant under a shift of argument, and are step 
functions, we have 

/ 9n( x ) dx = I 9n (x- nx*)dx = \ V g~(i/n .- nx*) 

^ ~dYl 9{i/n-nx*) ^ \ ^ g+(i/n-nx*) (35) 



-I 



g^(x — nx*) dx 



g+(x) dx. 



Both g+ and g~ converge to g pointwise (since g is continuous). Furthermore, 
gti x ) are both bounded from above due to the properties of the assumed ma- 
jorizing function, and hence L d g~ (x) dx and f Rd g+(x)dx both converge to 
J Rd g(x) dx as n — > oo by the dominated convergence theorem. But then, the 
same must be true of the sum in d35l >. which proves the assertion. □ 

Corollary 1. For any non-negative integer k, and any a > 



lim Y 



N 



-aN\i/N~x*\ 2 



dx 



Proof. Use Lemma^with n = \/~N and g(x) 

Lemma 2. For any A C Z d , 8 > and k e N, 

k 



\ x \k e -a\x\ 



N (k-d)/2 \J_ _ 



\i/N-x*\^5 



-2aN\i/N-x*\ 2 _ qI 



iNS 2 



Proof. Just note that 



(36) 



□ 



(37) 



N (k-d)/2 ^ 

\i/N-x*\^5 



N 



-2uN\i/N-x*\ 2 



< -aNS 2 N (k-d)/2 \p |J_ 

| N 



-aN\i/N-x*\ 2 



(38) 



and apply Corollary^to the last expression to get the assertion. □ 

Corollary 2. Corollary\l\holds true with 1 d replaced by S(N) of d27t . 

Proof. Since x* € int(£>), we may choose a 8 > so that Z d \ S(N) C {i £ Z d : 
\i/N — x* | ^ <5}. Then, the difference in the sum in (1361 is ©(e^"^" 5 ), according 
to Lemma|2] with A = S(N). □ 
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Proof (of Proposition^. Since we may write 



N 



1 AT( fc - d )/ 2 |i/jV - x *\k e -2aN\i/N-x'\ 2 



N k / 2 



]yd/2^2 s e -2otN\i/N-x* 



Lemma|2]and Corollary|2]entail that, for k > 0, 



y I- 

\ N 



ieS(N) 
\i/N-x*\^S 



and 



y \- 



ieS(JV) 

|i/JV-a*|<(5 



uf = <D(e 



v] = O 



-aNS 2 



N k/2 



(39) 



(40) 



So far, we have only used that x* is in int(D). But x* is also a point where E(x) 
assumes its maximum, and E(x) is twice differentiable in a neighbourhood of x* . 
Hence, there exist 6 > and ^ C < oo, such that, for all [ar — a?* | < S, 
E(x) > E{x*) - C\x ~ x*\ 2 . Therefore, 

+ E E (^)"- 2 + E 



|i/JV-x*|<(5 



iGS 
\i/N-x*\^6 



^E{x*)(l + (D{e- aNs2 ))-C y 

lES 
\i/N-x*\<8 



\N 



O 



where we have used J28l > along with normalization in the first, ( I39l > in the second, 
and d39t and ( 1401 in the last step. This proves the assertion of Proposition|3] □ 

After dealing with the diagonal part, we are now ready to embark on the 
quadratic form. 

Proposition 4. Let Fij be as in ( 1291 . and assume that f satisfies condition i30\ of 
Theorem^} Then, 

yv^v^oQ] 



Proof. Evaluating the difference between | i/N—x* \ 2 = (i/N—x*,i/N—x*) and 
\j/N-x*\ 2 = (j/N-x*, j/N-x*), we first note that \j/N-x*\ 2 -\i/N-x*\ 2 = 

((i + j)/N — 2x* , (j — i) /N) (here, (. , .) denotes the scalar product). In view of 

v, = ce-aN^/N-x-^/N-x*)^ and with j = i + kt 



Vi > v i+k ^=> r)(i, k) := / 



^±±-2x* A\ >0 
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(note that rj(i, 0) = 0). Using Fij = Fji (see ©), (vi — Vj) 2 = (vj — Vi) 2 , and 
Fi.i+k = fk{i/N) + 0{l/N) (see \29V ). we can rewrite the quadratic form as 



E ViFijVj = ~7;E E F i,i+k( V i -V l+k ) 2 

E E F hi+k( V i ~ v i+kf 
i£S keS-i 
T](i,k)>0 

E E (a(^)+ (^))<- — 



2 

i.j'eS ies fces-i 



fej 



r;(i,A;)>0 

We have thus achieved that the summation includes only terms where Vi > Vi + k, 
which entails that 

„. _ Vi+k = ce -aN\i/N- x '\ 2 ^ _ e-aiV^i.fc)) ^ caNe - a N\i/N-x' fc j j 

since 1 — e~ x ^ min(x, 1) ^ x for a; (of which we only use the latter 
inequality). Together with the fact that the quadratic form is negative semidefinite, 
this gives 



> ~- E E F i<i+k( V i - v i+kf 

ies kes-i 



ies keS-i 

rj(i,k)>Q 



^-a 2 N 2 J2v 2 £ Uf*) +0 a))(r,(i,k)) a . (41) 



In the last step, the constraint on the sum could be removed since we added to 
the sum nonnegative terms only: fk(i/N) for k ^ (up to 0(1/N)), and 

(r)(i, k)) 2 ^ with equality for k = 0. 

We now note that J30i entails that, for 1 ^ £, m ^ d, 

E fk(^)hk m , E and E fk(^)k]k 2 m /N 

k£S-i keS-i keS-i 

(42) 

are all bounded from above by a positive constant C (the latter case relies on 
S/N C D with compact D). Writing 



' 171=1 



allows us to bound the various parts of the sum in (14 11 as follows: 
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ieS keS-i i,m=l 

m=l ieS 

where we used the Cauchy-Schwarz inequality for 

e,m=l 1=1 m=l 

d42l in the first, and ((39} and (0Qj in the last step. 
Again, with (El, 01, and {p, we obtain 

iSS £,m=l keS-i 

where we further used that X^=ilv/^ — a; J | ^ c|z/iV — x*\ for some positive 
constant c. Finally, (1421 also gives that 

£ a(s)#-°(s)- <45) 

ieS £,m=lk£S-i 

Combining ( 14-31 . J44i . and ( 145 > . we arrive at the assertion. □ 

Remark 4. Eq. d45i is the reason that the lower bound in ( Bit cannot be improved 
by better approximations in ( 1281 and ( I29> . 

Remark 5. We have, so far, assumed that a;* is in the interior of D. If x* is on the 
boundary of D, a similar approach may be taken with a one-sided, exponentially 
decaying test function. The error in the approximation will, however, be larger than 
in the case tackled here. 

So far, we have used the Rayleigh-Ritz variational principle il It to obtain 
results on the leading eigenvalue of H, but said nothing about the maximizer (note 
that the latter need not coincide with the test function v). Recall from Section [2] 
that, for finite N, the maximizer is unique and - in its L 1 version - given by the 
ancestral distribution a = (hiZi)i e s- Actually, from the bounds above, we can also 
conclude that a is concentrated in a neighbourhood of x* , where the width of the 
neighbourhood depends on the behaviour of E near its maximum. In the generic 
case of a quadratic maximum, a is concentrated in a region with a width of order 
1/VN. More precisely, we have: 
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Theorem 2. Let Ei and F^ satisfy the assumptions of Theorem^] Assume that E 
assumes its maximum at a unique point x* G int(_D), and that the Hessian of E at 
x* is negative definite. 

Then, there is a p > independent of N, so that, for every < [3 $J 1 and N 
large enough: 

]T a* < (3 , 

ies 

\i/N-x*\^p//3N 

where a is the ancestral distribution (of H7\ and Prop.]2§. 

Proof. Recall first that the (L 2 ) maximizer of il It is given by u; = (yfaVjieS (cf. 
(I17» . Hence, by Theorem[2 the negative semidefiniteness of F, and d28i . we have 



c 

E(x*) - — ^ A max = ^2 WiFijWj + 2J Eiwf 

i,jeS ieS .... 

1 (46) 

< £ £ 4 u> 2 < max^ = E(x*) + o(-) . 
ies 

Now, consider E(x) in a neighbourhood of x*. Since the Hessian at x* is negative 
definite, we have E(x) ^ E(x*) — C\x — x*\ 2 for some C > in a neighbourhood 
of x*, this being independent of N. For e small enough and 6(e) := yfeJC, 
therefore, 

fi^x*), \x-x*\ <8(s) 
{) ^ \E(x*) -e, \x-x*\^S(e). 
Together with d28i and J46i . this implies 

E(x*)+o(j-) =Y,E l w^E(x*)-s w "+°{^ 
ies ies 

\i/N-x*\^S(e) 

^E(x*) + o(± 

Hence, for some positive constant 7, 

< e w ^ 
ies 

\i/N—x*\^y/e/C 

for all sufficiently small e. Choosing e = j/flN and p = j/C gives the assertion. 

□ 

Remark 6. For notational simplicity, we have assumed above that E(x) assumes 
its (absolute) maximum at a unique point x*, which is the generic case. It is ob- 
vious from the proof, however, that an analogous result holds if the maximum is 
assumed at a finite number of points (each with a negative definite Hessian). Then, 
the ancestral distribution is concentrated on the union of the corresponding neigh- 
bourhoods of these points (or a subset thereof), again with widths of order 1 / \/~N. 
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Let us return to the case where E(x) assumes its (absolute) maximum at a 
unique point x*. We have seen that the ancestral distribution concentrates around 
x* for N — > oo, in the sense that any given fixed fraction 1 — (3 (or even more) of 
the distribution's mass is contained in a region whose width decreases with 1 / y/N. 
From this, we can further conclude that the mean ancestral type (in proper scaling), 
(^,-iai)/iV, converges to x*, which adds some interpretation to the maximum 
principle in Theorem^ More precisely, we have 

Corollary 3. Under the assumptions of Theorem^ we have 

1 



ies 



O 



as N oo. 



Proof. By the triangle inequality, and with a constant p as in Theorem|2] we have 



— a 

N 



ieS 



ies 



i 

N 

E 



ies 



x a. 



* — i 

jj-X |0,+ 



E 



\x*-i/N\<^p//3N 



ies 

\x*-i/N\^y/ p/pN 



N 



for all < (3 ^ 1. The first term is bounded by \J p/j3N by construction. Due 
to Theorem |3 and the fact that S/N C D with compact D, the second term is 
bounded by Cf3 for some positive constant C. Thus, 



y- 



ies 



J w +Cf3 



for all < 13 < 1 and N large enough. Choosing = (3(N) = 1/N 1/3 gives the 
assertion. □ 



Remark 7. So far, we have only considered the leading eigenvalue and the cor- 
responding eigenvector, in 'crudest' approximation order 1/N. Using more ad- 
vanced techniques from perturbation theory 1331 . it would be possible to obtain 
results on further eigenvalues and eigenvectors, as well as higher-order error terms. 



5. Lumping 

Let us now drop the specific assumptions of the previous section, return to the gen- 
eral situation in the Introduction, and reflect on the type space S, which has, so far, 
remained unspecified. In the example of Section|3] the types were defined in terms 
of some intermediate genetic level that could be derived from a more detailed pic- 
ture. In this Section, we will show that a large class of models on some type space 
S can be derived, in a natural way, from models defined on a 'larger' space & 
(to be called genotype space) if the branching and mutation rates satisfy certain 
symmetry or compatibility conditions. The idea rests on the common assumption 
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that fitness depends on the genotype through an intermediate level of 'effective' 
parameters (which may, for example, be 'phenotypes', or 'genetic values' in quan- 
titative genetics), and the mapping from the genotype to this intermediate level is 
multiple-to-one. One will therefore try and combine several of the genotypes into a 
single effective type; if this is also compatible with the mutation scheme, a reduc- 
tion of the number of dimensions is possible. In the theory of Markov chains, this 
approach is known as lumping 1351 Ch. VI]. We will proceed in two steps: First, 
the lumping procedure will be described in an abstract setting, with arbitrary geno- 
type and type spaces S and S, respectively. In a second step, we will specialize to 
the concrete sequence (or multi-locus) picture. 

For the first step, let S be a possibly large, but finite set. In analogy with Q, 
consider the dynamics 

p = P (M + 11) (47) 

on ]Rl s l, with M a Markov generator and 1Z = diagjTvler | a € &}. For this 
discussion, Ai need neither be irreducible nor reversible. 
Consider a mapping 

<p : & — > S = im{ip) (48) 
so that S may be understood as the disjoint union of fibres <£ m : 

e = U c ®m , with <P m := {it £ 6 | <p(a) = m} = y- l (m) . 

We will now give conditions under which the dynamics (I47> may be reduced to 
a dynamics on S. The following result is a variant of a theorem by Burke and 
Rosenblatt 1121 . see also 1351 Chapter VI]. The setting is illustrated in Figure|5] 

Theorem 3. Let & and S be finite, let ip be the mapping of J48i . and assume that 
there are matrices M = {M nm ) n _ m ^s and R = diagji?; | i G S} with 

n a = R v{(T) , for all g 6 6, (49) 
^ M-cr.T = M v (a),7m for all a G 6, raeS, (50) 

where Ai is the Markov generator ofEq. J47> . Then, M is a Markov generator on 
R |s| . If p solves (03, then 

y m ■= ^2 Pa (51) 

satisfies the differential equation Q, '•<?., y m = ^2 n yn(M nm + R n S n m)- If -M 
has stationary distribution tt = {^a)a£e, M has stationary distribution n = 
(^m)meS' where 7r m = X^e* ^ <J >' reversibility of M with respect to tt implies 
that of M with respect to tt. If M + 1Z has principal left eigenvector h, M + R 
has principal left eigenvector h with h rn = Xo-e* ^c- 
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Fig. 1. The lumping procedure. The 'large' space © is partitioned so that all elements in a 
given subset, say <P m , have the same reproduction rate R m (Eq. <49t ). and the same total 
mutation rate, XItg* Mcr,r, to elements in any other given subset & n (Eq. <50» . Then, 
each subset may be represented by a single element in a smaller space S, and the induced 
'effective' model on S is again a linear mutation-reproduction model. 



Proof. The proof is a straightforward verification. Note first that M is a Markov 
generator (on M) s \), because, for any a £ <P m , 

riSS neS re*™ t£6 

since M. is a Markov generator. 

Starting now from ( 15 1> and J47t . we find 

2/m = P<r = 2J X! Pr(-M TCT + "fc T (5 ra ) 

= /J Pr(-^y(r),m + -Ry(r)^y(-r),m) 

— ^ ^ Un[M nrrL -\- Rn&nm) ) 

where we have used ( I49> and ( 15 01 in the second step, and ( 15 1> in the last, together 
with the fact that both M v t T \ m and i? ¥ ,( T )^ ¥ ,( T ). m are constant on every fibre 

Finally, the assertions on stationary distributions and reversibility are direct 
verifications in the same spirit. □ 

6. From sequence space to type space 

In this Section, we will be more explicit and start from sequence space. The natural 
scheme that will emerge involves the grouping of sites together with a 'coarse- 
grained' dependence on some 'genetic distance'. Many of the frequently-used 
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models fall into this scheme. Related results appear in statistical physics, compare 
17 161 . from where we will borrow some techniques. 

Let us begin with the general setup for a mutation-reproduction model on se- 
quence space. We will assume that the type a of an individual is characterized by 
a (DNA, RNA) sequence which we take to be an element of the space & := £ 
with £ = {1, . . . , q}; we write a = (cti, . . . , <tn). For generality, we let q be an 
integer 2; if q = 2, the alternative choice £ = {— 1, 1} is often more conve- 
nient. Consider now a partition of the set of sites A = {1, . . . , N} into K disjoint 
subsets Ak, i.e., 

A = \\ A k . (52) 

Let V{£) = {(/xi, . . . , | m ^ 0, /Xf = 1} denote the simplex of probabil- 
ity measures (or vectors) on £. Set, with obvious meaning, 

and 

K 

V (Ali ..., AK) (£) = (g)V Ak (£). (53) 
fc=i 

That is, T d [a 1 ....,a k ){^') is the set of product measures with values restricted to 
certain rationals induced by the partition. 

Consider now the mapping (which will take the role of ip from the previous 
section) 

m : £ N — ► Q Kq , o i — ► m((j) (54) 
withm(cr) = (mf(cr))^ fe ^ and 

So, m|(cr) is the fraction of the sites in Ak which are in state I 6 £. Note 
that Yl\=i tttfcO 7 ) = 1, i.e., for each fc, mfc(cr) := (tn^(cr), . . . , m q k (a)) defines a 
probability measure on £, with xrik G (^)- 

Describing the system in terms of these lumped quantities will only lead to a 
simplification if a suitable symmetry is available. In our case, this is given by those 
permutations of the sites that are compatible with the chosen partition. 

Let r A be the permutation group on A = {1, ... , N}, i.e., 

r A := {7 I 7 : A — ► 7l is a bijection} , 
and Pm^. >i K ) the subgroup compatible with the partition ( 15 2> . i.e., 

^ 1> ...,^) = {7 e -T A I 7 (A fc ) = 1 < k < A"} ~ r Al x • • • x r Ax . 

We introduce the canonical action of the permutation group on £ N through the 
inverse permutation of sites, i.e., (7c) j = a^-iu). We are now ready for 
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Theorem 4. Let S N = {1, ...,q} N , and matrices M. = (M a ,T)a,Tes N an d 
1Z = diagjT^cr | a G S N } be given, with M. a Markov generator. Let p solve 
p = p(M + 11). Furthermore, let m be as in dSl . and S := m(S N ) C Q Kq . 

Assume that there exist a function g : £ x S N > Kj>o> an d matrices AI = 

(M mn ) m ng £ and R = diag{i?„ | n G 5*}, so that the following conditions are 
satisfied: 

(a) g(jr,ja) = g(r,a), for all 7 G r^ u ...,A K )> 

(b) M aT = ^m( CT ),m(r)5( cr , T ), M all <7,T G £ N J 

(c) = -R w(<y)j for all a ^E N . 

Then, y m := X^o-e* so ^ ves me differential equation y = y(M + i?), where 

is independent of the choice of a G Moreover, M is a Markov generator. If 
M. has stationary distribution n = (Tr^o-ee. M has stationary distribution n — 
(^m)meS' where 7r m = X^e* re v ers ibMty of M. with respect to tt implies 
that of M with respect to ir. If A4+lZhas principal left eigenvector h = {h a ) a ^e, 
then M + R has stationary distribution h = (hm) mG § with h m = X)ere<s 

Proof. For 7 G r (Ali ...^), we have 

m(7cr) = m(o-) and j{S N ) = S N , (56) 

where the first identity is obvious from J55t . Equation d56l > entails that 

7(^m)=^m, (57) 

i.e., il^ jij^) acts transitively on <£ m . 

In order to apply Theorem ^ we have to check assumption < !50b . Consider 
therefore E T6 <P m = M m(<T)iTO E re <2> m fffo t). For arbitrary 7 G r {Au _ <AK) , 
assumption (a) and Eq. d57t give 

V'(cr) := ^ g((7, t) = ^2 5(7c r ,7 T ) 

Due to the transitivity of -T(Ai,...,y1 A ') on ^*m> VK ") i s constant on the fibres 
^m((r)- Assumption J50i is therefore valid, and an application of Theorem^then 
gives the desired result. □ 

Remark 8. The connection with the situation in Section |4] is made by setting d = 
Kq, and observing that S/N C [0, l] d =: D. Obviously, S and D must take the 
roles of S and D. If \Ak\ ~ ctkN with positive constants a^, 1 < k < K, and 
J2k a k = 1> then £ becomes dense in D as N — > 00. The corresponding is a 
parallelepiped with edge lengths a/-. 
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Examples of particular relevance emerge if g is a iV^ /i /f ) -invariant dis- 
tance, such as the Hamming distance (i.e., the number of sites at which two se- 
quences differ). A very simple case was implicit in Section where the single- 
step mutation model on S = {0, 1, . . . , N} was interpreted in terms of a model on 
{ — 1, 1} N . Here, a site in state +1 or —1 corresponds to a site whose state does 
or does not coincide with the respective state in a reference sequence (sometimes 
called the 'wildtype'). If the reproduction and mutation rates only depend on the 
Hamming distance from the reference sequence, we are in a setting with K = 1, 
q = 2 and hence d = 2, which further boils down to d = 1 if the restriction 
m} + mf = 1 is used (see also below). In such a simple case, the lumped model is 
immediate. More elaborate examples will be discussed in the next Section. 



7. Towards Applications 

In many examples of sequence space models, the lumping construction as de- 
scribed in the previous sections leads to an effective model to which the maximum 
principle of Section[4]may then be applied. In particular, a given example will be 
a case for Theorem^if it has the following properties: 

(PI) The partition {Ak}^—-y in d52l > is relatively uniform, in the sense that there exist 
constants < c ^ C < 1 such that 

^ • , \A k \ , \Ah\ , n 

c inf — — - < sup — — C 

N i^k^K N 

uniformly in N. (Alternatively, this may be replaced by the single, and slightly 
weaker, condition lim inf at^oo inf i^fe^x J^l- > 0; note that X^-I^fel = -^by 
construction.) This condition ensures that Xi = i/N will become a meaningful 
continuous type variable for iV — > oo. 

For the next two properties, a suitable enumeration of the elements of S is required 
to ensure an appropriate representation of the matrices M and R. 

(P2) The function g that occurs in the sequence space mutation matrix and that is 
required in the lumping procedure (see Theorem^ decreases sufficiently fast 
away from the diagonal. Note that under condition (PI), for any a, r we have 
that 

N 

dn{o,T) ^ — ||m(o-) - m(r)||i , 

where c?h is the Hamming distance. Thus, if g has compact support indepen- 
dent of (as in the example in Section[3}, or if it decays sufficiently fast (e.g., 
exponentially) with dn, this entails the decay condition on / in Theorem^ 
(P3) After lumping, the effective reproduction and mutation matrices R and M lend 
themselves to a continuous approximation. That is, R m = r(m) + 0(1/N) 
and M m n = s(m,n) + 0(1/N) with functions r and s that are C^AM), 
where the implied constant in the 0(l/N) bound is uniform for all m and n. 
This entails the approximation condition on E and F in (12 8 1 and d29t that is 
also required for Theorem[2 
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Clearly, (P2) and (P3) stipulate that the enumeration of the types is adapted to 
the problem. The right choice is often intuitively clear, as in the examples in Sec- 
tion|5] and in |2l 1. But sometimes more thought is required, as will be illustrated 
by means of a few examples and special cases below. 

(El) Some simplifications arise in the case q = 2, where we now use S = {— 1, 1} 
rather than {1, 2}. Here, the constraint m^. + m\ = 1 can be used to reduce the 
number of variables per subset to one. It is convenient to set bk = rrijf, — mj;. 
Eq. J53i is then replaced by 

K 2 2 

V [Au „„ Ak) {E) = 0{-l, -l + _,..., 1 - 1} , 

and we obtain the simple formula 

*t. 'Ik 



(E2) The case d = 1 (and hence S C Z) corresponds to so-called 'mean field 
models'. They have been studied in the case where g(a, r) = for dn(a, r) > 
1. i.e.. mutation is restricted to neighbours in sequence space (see I3I4I47I5I 
Eg) for q = 2, and 12712 II for q = 4). 

(E3) A special type of models that falls into the above class is related to fitness land- 
scapes based on Hopfield Hamiltonians. These are special cases of spin-glass 
models 1391 that were originally motivated by neural networks, then became 
prototype models for random interactions in statistical physics, and were later 
also used as tunably rugged fitness landscapes in biology |38 45). 
Let us consider the case q = 2, with sequence space & = £ = { — 1, 1} N . 
A Hopfield Hamiltonian then is a function that assigns to every a 6 & an 
energy TLn{<j) in the following way: Al elements £ A/ of S N (known 

as patterns) are specified (usually by independent random draws from E N ). 
Given these, one defines 



where 



2 



(58) 



1 N 1 

w » : =]vE^ = /v (CT,e) ' (59) 

i.e., a sequence is assigned an energy by sitewise comparison of the sequence 
with all patterns (see Fig. 2). The properties (in particular, the ruggedness) of 
the energy landscape so defined (and to be used to assign fitness, see below) 
depends on the number and the particular choice of the patterns. 
Let us now explain the lumping procedure for 7ijv(c) I as adopted from ]6) 
and illustrated in Fig. 2 (the more general setting with q > 2 can be found 
in 1221 '). To this end, we associate with the collection of row vectors £ M the 
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Fig. 2. Lumping in a Hopfield model with M = 2. Here, £\£ 2 G {-1, 1}^ are two 
reference sequences ('patterns'). Fitness is assigned to a sequence a — (ai, (T2, . . . , (tn) £ 
{ — 1, 1}^ by sitewise comparison with both patterns (Eqs. 1581 . <59> . and J6 II ). This de- 
fines four subsets of sites (indicated by different shadings) so that the sites in each subset 
are equivalent with respect to both £ x and £ 2 and may thus be permuted without a change 
of fitness. 



M x TV matrix £ = {Cs)i%^n • We denote bv ^ the rows and b Y 6 the 
columns of this matrix. A partition Ai, . . . , Ak with K < 2 M is now obtained 
as follows. Let ei, . . . , e 2 M (e fe = (e^) 1 ^'*^ Af ) denote an enumeration of all 
T\/-dimensional column vectors with entries ±1. Then we set 

A k := {s G A | & = e fc } . 

If all the Ak are non-empty, K = 2 M ; otherwise, empty subsets may be omit- 
ted, and K < 2 M . We then have 

= ]v"E e £ E = ^Ekkmo, 

fc=l s6/l fc fe=l 

and so 

M X 

H N (a)=Nj2 E e^|A fc p,|6 fc (a)6^((7) 

is a function of the bk (c). Thus, if we consider reproduction and mutation rates 
of the form 

M aT = a(H N (a),H N (T)) g(a,r) , (60) 
= P(n N (a)) , (61) 

with a nonnegative function a and any real function j3, we may apply Theo- 
rem |4] to derive the effective dynamics with lumping according to the values 
of bfe(cr). In particular, the choice f3(x) = x gives the familiar Hopfield fit- 
ness landscape, and a(x) = 1 along with g(<7,r) = /i for djj(cr, r) = 1, 
<?(cr, r) = for djj(cr, r) > 1, and g{cr,(j) = — 2TV/i yields the decoupled 
sequence space mutation model where every site mutates independently and 
at the same rate /i (e.g., 0). It may be considered as the decoupled variant 
of the quasispecies model 1171 : the latter may be constructed in a similar way. 
Both are mutation-selection models in a molecular setting and well known for 
their error thresholds that may occur when /x surpasses a critical value. A pre- 
liminary analysis of sequence space mutation-selection models with Hopfield 
fitness has been given in (3j^45) and shows a rich behaviour, with various error 
thresholds, depending on the specific choice of patterns. 
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8. Summary and Discussion 

The motivation for this work came from haploid mutation-selection models, or 
other essentially linear models, which frequently appear in population biology. 
These are models for relative frequencies of types (genotypes, age classes...) in a 
population, which turn linear after a suitable transformation to quantities that may 
be interpreted as the absolute frequencies that would be obtained if growth were 
unrestricted. 

We have been mainly concerned with the leading eigenvalue of the matrix that 
describes this linear dynamics. This leading eigenvalue is the key to the asymptotic 
properties of the corresponding essentially linear model. For example, it directly 
yields the mutation load in a mutation-selection model. It also provides the key to 
the stationary distribution of types in the present as well as the ancestral population 
(the latter is obtained when running the process backward into the past until sta- 
tionarity is reached). Furthermore, its parameter dependence determines whether 
error thresholds occur in a given system. 

We have considered here the large class of models with a reversible muta- 
tion part, meaning that, in the (hypothetic) mutation equilibrium it in the absence 
of reproduction, the mean number of transitions between any pair of types is the 
same in the forward and the backward direction. This is a standard assumption in 
many models of population genetics. Note that any symmetric mutation generator 
is automatically reversible (because tt is then the equidistribution). Many mutation 
models of classical population genetics are reversible (like the random-walk mu- 
tation model with Gaussian mutant distribution 1 10 42 1), and the same holds for 
practically all models of nucleotide evolution, as discussed already in Section ^ 
At the molecular level, reversibility is a basic assumption on which practically all 
model-based phylogenetic tree estimation methods rest. 

Reversibility implies that the matrix H that governs the linear(ized) dynamics 
is similar to a symmetric one, which in turn means that its leading eigenvalue may 
be determined by the Raleigh-Ritz variational principle. But this alone is not very 
useful in practice if the number of types is large, which is the usual situation in 
all but a few textbook examples. The main concern of this paper, therefore, was to 
reduce the number of dimensions to its 'effective' number. This involved two steps: 
A lumping procedure that leads to an equivalent smaller, still discrete, system; and 
an approximation that turns the discrete system into a continuous one by replacing 
the discrete types by a continuous type variable. Let us discuss them in turn. 

Lumping: This a kind of coarse-graining that applies if the fitness function and 
the mutation model on the 'original' (genotype) space 6 have enough symmetries 
to allow for lumping of several states of & into a single one, so that the induced 
'effective' model on a smaller space S is again a mutation-reproduction model. As 
illustrated in Fig. 1, this works if 

1 . for every state m in S, all states a G S that are lumped into it (i.e., all elements 
of the fibre <P m ) must have the same fitness, R m (Eq. J49» . and 

2. for every element a E <P m , the total mutation rate to 'target types' in <P n , 
i.e., J2te<p M- a ,T, must be the same; it may depend on n and m, but not on 
which particular element a € <P m is considered. Note, however, that only the 
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total mutation rates are relevant, not how they are distributed across the various 
types in <P n ; see Eq. i5Q\ and Fig. 1. 

Well-known examples that allow for lumping are evolution models on se- 
quence space E N , the set of possible sequences of length TV over an alphabet 
E (e.g., E = {A, G, C, T} or E = {-1, 1}), provided all sites mutate inde- 
pendently and according to the same process, and the fitness function is invariant 
under permutation of sites. Independent mutation is a perfectly natural standard 
assumption; permutation invariance of fitness is more of a restriction, but still a 
common assumption. It applies, for example, if fitness only depends on the se- 
quence through the number of mutated positions (i.e., the Hamming distance) rel- 
ative to the wildtype or some other reference sequence. Specifically, the fitness of 
regulatory sequences has been modelled as a hyperbolic function of their binding 
energy to the regulatory protein, which, in a good approximation, depends linearly 
on the number of mismatches relative to the perfectly matching sequence 1241 . 
Then, S = {0, . . . , N} d with d = \E\ is the obvious choice, where the elements 
of S are given by i = (ie)ees with %i denoting the number of sites occupied by 
letter I. In fact, d = \E\ — 1 is also sufficient due to the constraint J^ees ^ = N. 
If E = {— 1, 1} and if we assume parallel mutation and selection, we arrive at a 
special case of the single-step mutation model in Section|3] Namely, on E N , the 
non-diagonal elements of the mutation generator are M. a ,T = p/N if a and r 
differ at exactly one site, while all other elements vanish; on S = {0, 1, . . . , TV}, 
we get 

TV — i i 
U+ = p — — — = M M+ i and Ur = Hj^ = M^_ x (62) 

as the 'lumped' mutation rates (since TV — i and i, respectively, are the number of 
ways in which a sequence with i mismatches may mutate into one with i + 1 or 
i — 1 mismatches in one step). 

For simple situations like this one, the above lumping according to the Ham- 
ming distance is routinely used, one way or another (see, e.g., 1401241 1. It is also 
implicit in many multilocus models; here, the original genotype is usually not con- 
sidered at all, and one entirely relies on some effective model as identified with the 
number of mutated sites relative to some wild- or optimal type, see 1361141 . 

With somewhat more effort, models with a nucleotide alphabet may be treated 
along the same lines 1211 . this time, with d = \E\ — 1 = 3. What is less obvious 
is that the procedure also works for more interesting fitness functions like those 
that arise from Hopfield models. Here, again, E = {—1, 1}, but, this time, fitness 
is assigned according to the sitewise comparison of the sequence with several ref- 
erence sequences (known as patterns). Such fitness functions have multiple peaks, 
are tunably rugged, and fail to be permutation invariant across all sites. Rather, the 
set of sites A = {1, 2, ... , TV} may be partitioned into d — K (disjoint) subsets so 
that the sites in each subset are equivalent with respect to all reference sequences. 
Consequently, permutation invariance still applies within subsets, and the effec- 
tive type now is a d-tuple of letter frequencies, each taken over the sites in a given 
subset. For details, see Section^ and Fig. 2. 

Continuous approximation: Even after lumping, the state space is usually large, 
typically S = {0, 1, . . . , TV} d with large TV and moderate d. In a second simplifi- 
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eation step (that may, of course, be applied independently if the model was on S 
in the first place), we now replace the discrete variational problem by a continuous 
one on a compact domain D £ M. d . As described in Section the discrete type 
i £ S is replaced by Xi = i/N in S/N, and approximated by a continuous vari- 
able x in the limit N — > oo. For the two-state model discussed above, x £ [0, 1] is 
simply the fraction of mutated sites relative to the reference sequence. (In popula- 
tion genetics, the infinite sites limit N — > oo at constant i (and hence i/n — » 0) is 
more familiar; for a discussion of how this relates to the limiting procedure here, 
see 1261 and 0). For models with a nucleotide alphabet, x £ [0, l] 3 tells us at 
which fraction of the sites there is a replacement of the reference letter by one of 
the three other nucleotides (in a suitable encoding). Finally, in the Hopfield model, 
x £ [0, l] d holds the fractions of sites that read '+1' within the d subsets. 

Our main result, Theorem [2 now rephrases the variational problem in terms 
of matrices E and F that result from symmetrization of M, and hence of M + R. 
F is the symmetrized mutation matrix, as far as the non-diagonal elements are 
concerned; its diagonal elements are arranged so that F is a Markov generator. E 
is a diagonal matrix that holds both the reproduction rates and contributions from 
the mutation rates. 

Theorem now tells us that, under certain conditions on E and F, a large 
simplification relative to the discrete problem is obtained: The variational problem 
boils down to a continuous one on D C K d . If d is small, this can often be solved 
explicitly. Let us now first discuss these conditions, and then the result, in more 
detail. 

The assumptions on E and F in d28i . j29t and J30l > appear to be rather special, 
but they are, in fact, very natural for many models in population genetics. The 
continuous approximation of the matrices E and F, as imposed by d28b and i29\ . 
always applies if the reproduction and mutation rates have their own continuous 
approximations each (i.e., R4 = r(i/N)+0(l/N) and My = u k (i/N)+0{l/N) 
with C^(D, M.) functions r and Uk for all i,j, where k = j — i) as in the single-step 
mutation model (Section[3]and Eq. i62\ ). For lumped versions of sequence space 
models, the condition on the mutation part is always fulfilled; often, the continuous 
version is even exact, i.e. without the 0(1/N) term, as we see from i62\ . As to the 
reproduction rates, the condition requires that the fitness function becomes locally 
smooth when the types become continuous (but this does not exclude ruggedness 
at a larger scale). Many models in population genetics rely on this assumption, in 
particular, the usual models of quantitative genetics (for review, see [10|). 

Furthermore, F must decay sufficiently fast away from the diagonal (Eq. J30» . 
If we have a suitable distance between types and mutation decays fast enough 
with distance, then, with a suitable indexing, the symmetrized mutation matrix F 
will be concentrated around its diagonal. In the single-step mutation model, M 
is tridiagonal, and hence J30l > is trivially satisfied. In many other models (such as 
the random walk mutation model with Gaussian mutant distribution), the decay is 
exponential and hence even faster than the cubic decay required in d30i . 

Under the conditions just discussed, it turns out that the remaining variational 
problem involves only the diagonal term E; F contributes only an 'irrelevant' 
0(1/ N) term. The maximum of the continuous function E(x) that approximates 
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the entries of E then yields the leading eigenvalue, or mean fitness, in leading 
order. For the single-step mutation model (d = 1), E(x) is easily seen to be 
E(x) = r{x) — g(x) (cf. (I25» . where r is the (continuous approximation of) 
the fitness of type x, and g(x) has a plausible interpretation as loss in fitness due to 
mutation 1261 . The explicit expression for E(x) is immediate in this case since the 
mutation matrix is tridiagonal. In nontrivial examples, however, more work is re- 
quired to get this function explicitly; examples will be presented in a forthcoming 
paper. 

In the generic case that E(x) has a unique, quadratic maximum, we can further 
say that the ancestral distribution is concentrated around the point x* at which 
E{x) assumes its maximum. More precisely, any given fraction of at least 1 — j3 
of the distribution's mass is contained in an interval centred at x* whose width 
decreases as l/\/~N (Theorem|2j. As a consequence, x* obtains the interpretation 
of the mean ancestral type, up to an error term of the order of at most 1/iV 1 / 3 
(Corollary OJl. 

Open questions concern the stationary distribution in the present population, 
and quantities associated with it. In the single-step model, the mean type of the 
present population is available through the inverse function of r evaluated at A max 
(if r is monotonic); this also leads the way to other properties of the distribution, 
in particular, the variance of the present type, and the variance in fitness |26|. This 
does, however, not carry over to higher dimensions in a simple way - the present 
seems to be more difficult to deal with than the past! For the same reason, the 
criteria for the existence of error thresholds given in 1 26 1 remain to be generalized. 

The motivation for this work came from continuous-time mutation-reproduction 
(or mutation-selection) models (cf. Q, and @), which also set the scene for 
this discussion. However, it should have become clear that our results are not tied 
to these specific models. Our main result (Theorem^ simply yields asymptotic 
estimates for the leading eigenvalues of large matrices that possess a certain con- 
tinuous approximation, and whose elements decay sufficiently fast away from the 
diagonal. These properties are shared by many dynamical systems (in discrete 
and continuous time); obvious candidates are models with migration and spatially 
varying growth rate (see 1371 Chap. II] for an overview of spatially structured pop- 
ulation models). 



Acknowledgements. Support and hospitality of the Erwin Schrodinger International Insti- 
tute for Mathematical Physics in Vienna, where this work was initiated and partly carried 
out, is gratefully acknowledged. A.B. also thanks the German Research Council (DFG) for 
partial support in the Dutch-German Bilateral Research Group "Mathematics of random 
Spatial Models from Physics and Biology". It is our pleasure to thank T. Garske and P. 
Blanchard for helpful discussions, and R. Burger and an anonymous referee for reading the 
manuscript very carefully and providing valuable suggestions for improvement. 

References 

1. Akin, E.: The Geometry of Population Genetics. Lect. Notes Biomath., vol. 31, 
Springer, Berlin (1979). 

2. Athreya, K. B., Ney P. E.: Branching Processes. Springer, New York (1972). 



30 



Baake et al. 



3. Baake, E., Baake, M., Wagner, H.: Ising quantum chain is equivalent to a model of 
biological evolution. Phys. Rev. Lett. 78, 559-562 (1997), and Erratum, Phys. Rev. 
Lett. 79, 1782 (1997). 

4. Baake, E., Baake, M., Wagner, H.: Quantum mechanics versus classical probability in 
biological evolution. Phys. Rev. ESI, 1191-1192 (1998). 

5. Baake, E., Wagner, H.: Mutation- selection models solved exactly with methods from 
statistical mechanics. Genet. Res. 78, 93-117 (2001). 

6. Ben-Arous, G., Bovier, A., Gayrard, V.: Glauber dynamics of the random energy 
model. 1. Metastable motion on the extreme states. Commun. Math. Phys. 235, 379- 
425 (2003). 

7. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in stochastic dynam- 
ics of disordered mean-field models. Probab. Theor. Rel. Fields 119, 99-161 (2001). 
cond-mat/98 11331 

8. Bremaud, P.: Markov Chains. Gibbs Fields, Monte Carlo Simulation, and Queues. 
Springer, New York (1999). 

9. de Bruijn, N.G.: Asymptotic Methods in Analysis. 3rd ed., corrected reprint, Dover, 
New York (1981). 

10. Burger, R.: The Mathematical Theory of Selection, Recombination, and Mutation. Wi- 
ley, Chichester (2000). 

11. Bulmer, M.: Theoretical Evolutionary Ecology. Sinauer, Sunderland (1994). 

12. Burke, C, Rosenblatt, M.: A Markovian function of a Markov chain. Ann. Math. 
Statist. 29, 1112-1122 (1958). 

13. Caswell, H.: Matrix Population Models: Construction, Analysis, and Interpretation. 
2nd ed., Sinauer, Sunderland (2000). 

14. Charlesworth, B.: Mutation-selection balance and the evolutionary advantage of sex 
and recombination. Genet. Res. Camb. 55, 199-221 (1990) 

15. Crow, J.F., Kimura, M.: An Introduction to Population Genetics Theory. Harper & 
Row, New York (1970). 

16. Durrett, R.: Probability models for DNA sequence evolution. Springer, New York 
(2002). 

17. Eigen, M., McCaskill, J., Schuster, R: The molecular quasi-species. Adv. Chem. Phys. 
75, 149-263 (1989). 

18. Ethier, S.N., Kurtz, T.G: Markov Processes - Characterization and Convergence. 
Wiley, New York (1986). 

19. Ewens, W.: Mathematical Population Genetics. 2nd ed., Springer, New York (2004). 

20. Ewens, W., Grant, R: Statistical Methods in Bioinformatics. Springer, New York 
(2001). 

21. Garske, T., Grimm, U.: A maximum principle for the mutation- selection equilibrium 
of nucleotide sequences. Bull. Math. Biol. 66, 397-421 (2004). physics/0303053 

22. Gayrard, V.: The thermodynamic limit of the g-state Potts-Hopfield model with in- 
finitely many patterns. J. Stat. Phys. 68, 977-1011 (1992). 

23. Georgii, H.-O., Baake, E.: Multitype branching processes: the ancestral types of typical 
individuals. Adv. Appl. Prob. 35, 1090-1110 (2003). math. PR/03 02049 

24. Gerland, U., Hwa, T: On the selection and evolution of regulatory DNA motifs. /. 
Mol. Evol. 55, 386-400 (2002) 

25. Hadeler, K.P: Stable polymorphisms in a selection model with mutation. SIAM J. 
Appl. Math. 41, 1-7 (1981). 

26. Hermisson, J., Redner, O., Wagner, H., Baake, E.: Mutation-selection balance: 
Ancestry, load, and maximum principle. Theor. Pop. Biol. 62, 9^16 (2002). 
cond-mat/0202432 

27. Hermisson, J., Wagner, H., Baake, M.: Four-state quantum chain as a model of se- 
quence evolution. J. Stat. Phys. 102, 315-343 (2001). cond-mat/0008123 

28. Hofbauer, J.: The selection-mutation equation. J. Math. Biol. 23, 41-53 (1985). 

29. Jagers, P.: General branching processes as Markov fields. Stoch. Proc. Appl. 32, 183- 
242 (1989). 



Asymptotic maximum principle 



31 



30. Jagers, P.: Stabilities and instabilities in population dynamics. J. Appl. Prob. 29, 770- 
780 (1992). 

31. Karlin, K.S., Taylor, H.M.: A first course in stochastic processes. 2nd ed., Academic 
Press, San Diego (1975). 

32. Karlin, K.S., Taylor, H.M.: A Second Course in Stochastic Processes. Academic Press, 
San Diego (1981). 

33. Kato, T: Perturbation Theory for Linear Operators, reprinted ed., Springer, New York 
(1995). 

34. Keilson, J.: Markov Chain Models - Rarity and Exponentiality. Springer, New York 
(1979). 

35. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, New York (1981). 

36. Kondrashov, A.S.: Deleterious mutations and the evolution of sexual reproduction. 
Nature 336, 435^140 (1988). 

37. Kot, M.: Elements of Mathematical Ecology. Cambridge University Press, Cambridge, 
2001. 

38. Leuthausser, I.: Statistical mechanics of Eigen's evolution model. J. Stat. Phys. 48, 
343-360 (1987). 

39. Mezard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scien- 
tific, Singapore (1987). 

40. Nowak, M., Schuster, P.: Error thresholds of replication in finite populations. Mutation 
frequencies and the onset of Muller's ratchet. J. Theor. Biol. 137, 375-395 (1989). 

41. Prasolov, V.V.: Problems and Theorems in Linear Algebra. AMS, Providence, RI 
(1994), corrected reprint (1996). 

42. Redner, O.: Discretization of continuum-of-alleles models. Proc. Edinburgh Math. 
Soc, in press (2004). math.SP/0301024 

43. Rouzine, I.M., Wakeley, J., Coffin, J.M.: The solitary wave of asexual evolution. PNAS 
100, 587-592 (2003). 

44. Swofford, D.L., Olsen, G.J., Waddell, P.J., Hillis, D.M.: Phylogenetic Inference, in: 
D. M. Hillis, C. Moritz and B. K. Mable, eds., Molecular Systematics, Sinauer, Sun- 
derland (1995), pp. 407-514. 

45. Tarazona, P.: Error threshold for molecular quasispecies as phase transition: From 
simple landscapes to spin glass models. Phys. Rev. A 45, 6038-6050 (1992). 

46. Thompson, C.J., McBride, J.L.: On Eigen's theory of the self-organization of matter 
and the evolution of biological macromolecules. Math. Biosci. 21, 127-142 (1974). 

47. Wagner, H., Baake, E., Gerisch, T: Ising quantum chain and sequence evolution. J. 
Stat. Phys. 92, 1017-1052 (1998). 



